Natural language processing for under-resourced languages: Developing a Welsh natural language toolkit
Abstract
Language technology is becoming increasingly important across a variety of application domains which have become commonplace in large, well-resourced languages. However, there is a danger that small, under-resourced languages are being increasingly pushed to the technological margins. Under-resourced languages face significant challenges in delivering the underlying language resources necessary to support such applications. This paper describes the development of a natural language processing toolkit for an under-resourced language, Cymraeg (Welsh). Rather than creating the Welsh Natural Language Toolkit (WNLT) from scratch, the approach involved adapting and enhancing the language processing functionality provided for other languages within an existing framework and making use of external language resources where available. This paper begins by introducing the GATE NLP framework, which was used as the development platform for the WNLT. It then describes each of the core modules of the WNLT in turn, detailing the extensions and adaptations required for Welsh language processing. An evaluation of the WNLT is then reported. Following this, two demonstration applications are presented. The first is a simple text mining application that analyses wedding announcements. The second is a Twitter NLP application, which extends the core WNLT pipeline. As a relatively small-scale project, the WNLT makes use of existing external language resources where possible, rather than creating new resources. This approach of adaptation and reuse can provide a practical and achievable route to developing language resources for under-resourced languages.
Similar resources
VnCoreNLP: A Vietnamese Natural Language Processing Toolkit
We present an easy-to-use and fast toolkit, namely VnCoreNLP—a Java NLP annotation pipeline for Vietnamese. Our VnCoreNLP supports key natural language processing (NLP) tasks including word segmentation, part-of-speech (POS) tagging, named entity recognition (NER) and dependency parsing, and obtains state-of-the-art (SOTA) results for these tasks. We release VnCoreNLP to provide rich linguistic...
PSI-Toolkit: A Natural Language Processing Pipeline
The paper presents the main ideas and the architecture of the open source PSI-Toolkit, a set of linguistic tools being developed within a project financed by the Polish Ministry of Science and Higher Education. The toolkit is intended for experienced language engineers as well as casual users not having any technological background. The former group of users is delivered a set of libraries that...
The Stanford CoreNLP Natural Language Processing Toolkit
We describe the design and use of the Stanford CoreNLP toolkit, an extensible pipeline that provides core natural language analysis. This toolkit is quite widely used, both in the research NLP community and also among commercial and government users of open source NLP technology. We suggest that this follows from a simple, approachable design, straightforward interfaces, the inclusion of robust...
NL Assistant: A Toolkit for Developing Natural Language Applications
We will be demonstrating a toolkit for developing natural language-based applications and two applications. The goals of this toolkit are to reduce development time and cost for natural language-based applications by reducing the amount of linguistic and programming work needed. Linguistic work has been reduced by integrating large-scale linguistic resources--Comlex (Grishman et al., 1993) a...
FudanNLP: A Toolkit for Chinese Natural Language Processing
The need for Chinese natural language processing (NLP) is growing across a range of research and commercial applications. However, most current Chinese NLP tools and components still have a wide range of issues that need to be further improved and developed. FudanNLP is an open source toolkit for Chinese natural language processing (NLP), which uses statistics-based and rule-based method...
Journal
Journal title: Computer Speech & Language
Year: 2022
ISSN: 1095-8363, 0885-2308
DOI: https://doi.org/10.1016/j.csl.2021.101311